Evaluating Decision Trees Grown with Asymmetric Entropies
Abstract
We propose to evaluate the quality of decision trees grown on imbalanced datasets with a splitting criterion based on an asymmetric entropy measure. Several authors have proposed such asymmetric splitting criteria to deal with the class imbalance problem in machine learning, especially with decision trees. After the tree is grown, a decision rule has to be assigned to each leaf. The classical Bayesian rule, which selects the most frequent class, is ill-suited when the dataset is strongly imbalanced, because it tends to assign nearly every leaf to the majority class. A better-suited assignment rule that takes the asymmetry into account must be adopted. But how can we then evaluate the resulting prediction model? The usual error rate is likewise uninformative when the classes are strongly imbalanced, so appropriate evaluation measures are required in such cases. We consider ROC curves and recall/precision graphs for evaluating the performance of decision trees grown from imbalanced datasets. These evaluation criteria are used to compare trees obtained with an asymmetric splitting criterion against those grown with a symmetric one. In this paper we consider only the two-class case.
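The following is a minimal sketch, not the paper's procedure, of why the error rate can mislead on imbalanced data and how ROC and recall/precision points can be read off a tree's leaves. The leaf counts are invented for illustration, and labeling a leaf positive when its minority-class proportion exceeds the overall minority prior is only one possible asymmetry-aware rule, assumed here; the abstract does not say which rule the authors use.

```python
# Hypothetical leaves of a grown tree: (minority/positive count, majority/negative count).
# These numbers are made up for illustration only.
leaves = [(8, 2), (6, 14), (4, 84), (2, 880)]


def confusion(leaves, threshold):
    """Label a leaf positive when its positive proportion is >= threshold."""
    tp = fp = fn = tn = 0
    for pos, neg in leaves:
        if pos / (pos + neg) >= threshold:
            tp += pos  # positives in a leaf labeled positive
            fp += neg  # negatives in a leaf labeled positive
        else:
            fn += pos  # positives in a leaf labeled negative
            tn += neg  # negatives in a leaf labeled negative
    return tp, fp, fn, tn


def metrics(leaves, threshold):
    """Error rate, recall (true positive rate), false positive rate, precision."""
    tp, fp, fn, tn = confusion(leaves, threshold)
    total = tp + fp + fn + tn
    return {
        "error": (fp + fn) / total,
        "recall": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "precision": tp / (tp + fp) if tp + fp else None,
    }


def curve_points(leaves):
    """One point per distinct leaf proportion, from strictest to loosest threshold.

    (fpr, recall) pairs trace the ROC curve; (recall, precision) pairs trace
    the recall/precision graph.
    """
    thresholds = sorted({p / (p + n) for p, n in leaves}, reverse=True)
    return [metrics(leaves, t) for t in thresholds]


# Majority rule: a leaf is positive only if positives are the more frequent class.
majority_rule = metrics(leaves, 0.5)

# Assumed asymmetry-aware rule: compare each leaf with the overall minority prior.
prior = sum(p for p, _ in leaves) / sum(p + n for p, n in leaves)
prior_rule = metrics(leaves, prior)

points = curve_points(leaves)
```

On these invented counts the majority rule reaches an error rate of 0.014 while recovering only 40% of the minority class, whereas the prior-based rule has a higher error rate (0.102) but a recall of 0.9. This is the kind of trade-off that ROC and recall/precision graphs expose and a single error rate hides.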
Related resources
A Comparison of Different Off-Centered Entropies to Deal with Class Imbalance for Decision Trees
In data mining, large differences in prior class probabilities, known as the class imbalance problem, have been reported to hinder the performance of classifiers such as decision trees. Dealing with imbalanced and cost-sensitive data has been recognized as one of the 10 most challenging problems in data mining research. In decision tree learning, many measures are based on the concept of Shannon...
Comparison of Shannon, Renyi and Tsallis Entropy Used in Decision Trees
Shannon entropy used in standard top-down decision trees does not guarantee the best generalization. Split criteria based on generalized entropies offer different compromises between purity of nodes and overall information gain. Modified C4.5 decision trees based on Tsallis and Renyi entropies have been tested on several high-dimensional microarray datasets with interesting results. This approac...
Evaluating Asymmetric Decision Problems with Binary Constraint Trees
This paper proposes the use of binary trees in order to represent and evaluate asymmetric decision problems with Influence Diagrams (IDs). Constraint rules are used to represent the asymmetries between the variables of the ID. These rules and the potentials involved in IDs will be represented using binary trees. The application of these rules can reduce the size of the potentials of the ID. As ...
Using Local Node Information in Decision Trees: Coupling a Local Labeling Rule with an Off-centered Entropy
Dealing with skewed class distribution and cost-sensitive data has been recognized as one of the 10 most challenging problems in data mining research. These problems have been reported to hinder the performance of classifiers, especially on the minority class. To deal with this problem in decision tree induction we proposed an off-centered entropy while other authors proposed an asymmetric entro...
Probabilistic analysis of the asymmetric digital search trees
In this paper, by applying three functional operators the previous results on the (Poisson) variance of the external profile in digital search trees will be improved. We study the profile built over $n$ binary strings generated by a memoryless source with unequal probabilities of symbols and use a combinatorial approach for studying the Poissonized variance, since the probability distribution o...
Publication date: 2008